Cap max RetryableAction wait time/timeout. #74940
henningandersen merged 4 commits into elastic:master
RetryableAction uses randomized exponential backoff. If unlucky, the randomization could produce a series of very short waits, each of which still doubles the bound, risking a subsequent very long wait. Now the delay is randomized within [bound/2, bound[. Closes elastic#70996
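For illustration, a minimal sketch of the idea described above, not the actual RetryableAction code. The class and method names (BackoffSketch, randomizedDelayMillis, totalWaitMillis) are hypothetical, and the sketch assumes the bound simply doubles after every attempt. Drawing each delay from [bound/2, bound[ means the time already waited is always at least half of the current bound, so the bound cannot grow far ahead of the elapsed time.

```java
import java.util.Random;

// Illustrative sketch only; NOT the actual RetryableAction implementation.
final class BackoffSketch {

    // Picks a delay uniformly from [bound/2, bound). Assumes boundMillis >= 1.
    static long randomizedDelayMillis(long boundMillis, Random random) {
        long lower = boundMillis / 2;
        int range = Math.toIntExact(boundMillis - lower); // at least 1 when boundMillis >= 1
        return lower + random.nextInt(range);
    }

    // Sums the waits of `retries` consecutive attempts, doubling the bound after each one.
    static long totalWaitMillis(long initialBoundMillis, int retries, Random random) {
        long bound = initialBoundMillis;
        long total = 0;
        for (int i = 0; i < retries; i++) {
            total += randomizedDelayMillis(bound, random);
            bound *= 2; // the bound grows exponentially, independent of the delay drawn
        }
        return total;
    }
}
```

With an initial bound of 10 ms and 5 retries, the bounds are 10, 20, 40, 80 and 160 ms, so the total wait in this sketch always falls in [155, 310[ ms. With delays drawn from [0, bound[ instead, the first four waits could all be close to zero while the fifth approaches 160 ms.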
Pinging @elastic/ml-core (Team:ML)
Pinging @elastic/es-distributed (Team:Distributed)
davidkyle left a comment:
ML changes LGTM
Consider renaming the calculateDelay(long) function to calculateMaxDelay(long) or calculateMaxDelayBound(long) as the logic to determine the actual delay passed to threadPool.schedule is in the onFailure method.
...n/ml/src/main/java/org/elasticsearch/xpack/ml/utils/persistence/ResultsPersisterService.java
Will this PR fix timeouts like this one? https://gradle-enterprise.elastic.co/s/qvkqnuh5ocffw
I am afraid not, the symptoms of the failing …
…s/persistence/ResultsPersisterService.java Co-authored-by: David Kyle <david.kyle@elastic.co>
💔 Backport failed
RetryableAction uses randomized exponential backoff. If unlucky,
the randomization could produce a series of very short waits, each of
which still doubles the bound, risking a subsequent very long wait. Now
the delay is randomized within [bound/2, bound[.
Closes #70996
This fixes testAckedIndexing, which failed because it ran into a very long final wait time for one of the retries.
I am not necessarily fixed on this solution, though it seems like a step in the right direction. An
alternative could be to cap the final wait time to meet the timeout, but that would make the behavior more
deterministic (less random), and the solution here should be good enough.
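For comparison, a minimal sketch of the alternative mentioned above, capping the wait so it never overshoots the timeout. This is not what the PR implements; the class and parameter names (CappedBackoffSketch, cappedDelayMillis, elapsedMillis, timeoutMillis) are hypothetical.

```java
// Illustrative sketch of the alternative only; NOT part of this PR.
final class CappedBackoffSketch {

    // Truncates a randomized delay to the time remaining before the timeout.
    static long cappedDelayMillis(long randomizedDelayMillis, long elapsedMillis, long timeoutMillis) {
        long remaining = Math.max(0L, timeoutMillis - elapsedMillis); // never negative
        return Math.min(randomizedDelayMillis, remaining);
    }
}
```

The trade-off is the one noted above: whenever the cap applies, the final wait is fully determined by the remaining time, so the last retry is no longer randomized.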